Overview

Dataset statistics

Number of variables: 15
Number of observations: 13736
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.4 MiB
Average record size in memory: 489.2 B
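
The counts above can be reproduced on any table with a few lines of Python. The following is a minimal sketch using only the standard library, not the profiler's own code; the function name `dataset_statistics` and the three sample rows are hypothetical and exist only for illustration. The last line checks that the reported average record size and row count agree with the reported total size.

```python
from collections import Counter


def dataset_statistics(rows):
    """Summarise a table given as a list of dicts (one dict per observation)."""
    n_obs = len(rows)
    columns = list(rows[0].keys()) if rows else []
    n_cells = n_obs * len(columns)
    # A cell counts as missing when its value is None.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    # A row counts as a duplicate when an identical row appeared earlier.
    seen = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    return {
        "variables": len(columns),
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# Hypothetical sample rows; the profiled dataset has 13736 rows and 15 columns.
sample = [
    {"County Name": "Adams", "Calendar Year": 2003, "MA Individuals": 9000},
    {"County Name": "Adams", "Calendar Year": 2003, "MA Individuals": 9000},
    {"County Name": "York", "Calendar Year": 2003, "MA Individuals": None},
]
stats = dataset_statistics(sample)

# Consistency check on the reported figures: 489.2 B per record times 13736 records.
total_mib = 489.2 * 13736 / 2**20
```

With the figures from the report, `total_mib` comes out close to the stated 6.4 MiB.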

Variable types

NUM: 8
CAT: 7

Warnings

State Name has constant value "PA" (Constant)
FIPS State Code has constant value "42" (Constant)
Date has a high cardinality: 202 distinct values (High cardinality)
County Name has a high cardinality: 68 distinct values (High cardinality)
Georeferenced Latitude & Longitude has a high cardinality: 68 distinct values (High cardinality)
FIPS County Code is highly correlated with County Code (High correlation)
County Code is highly correlated with FIPS County Code (High correlation)
MA Children is highly correlated with MA Individuals (High correlation)
MA Individuals is highly correlated with MA Children (High correlation)
Georeferenced Latitude & Longitude is highly correlated with County Name (High correlation)
County Name is highly correlated with Georeferenced Latitude & Longitude (High correlation)
Date is uniformly distributed (Uniform)
County Name is uniformly distributed (Uniform)
Georeferenced Latitude & Longitude is uniformly distributed (Uniform)
County Code has 202 (1.5%) zeros (Zeros)
FIPS County Code has 202 (1.5%) zeros (Zeros)
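
The high-correlation warning for County Code and FIPS County Code can be checked by hand. The sketch below computes a Pearson coefficient in plain Python. Two things are assumptions rather than facts from the report: that the two codes are paired by rank (code 1 with FIPS 1, code 2 with FIPS 3, and so on), and that a threshold of 0.9 is what triggers the warning. Each code occurs equally often (202 times), so the distinct values alone give the same coefficient as the full columns.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# County Code takes the values 0..67.
county_code = list(range(68))
# FIPS County Code takes 0 plus the odd numbers 1..133.
# Pairing the two by rank is an assumption made for this sketch.
fips_county_code = [0] + [2 * code - 1 for code in range(1, 68)]

HIGH_CORRELATION_THRESHOLD = 0.9  # assumed threshold, not taken from the report

r = pearson(county_code, fips_county_code)
is_highly_correlated = abs(r) > HIGH_CORRELATION_THRESHOLD
```

Under these assumptions the relationship is almost exactly linear, so one of the two columns is redundant for modelling purposes.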

Reproduction

Analysis started: 2020-12-12 21:06:18.663235
Analysis finished: 2020-12-12 21:06:25.822896
Duration: 7.16 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

State Name
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 107.4 KiB

Value  Count  Frequency (%)
PA     13736  100.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Overview of Unicode Properties

Unique unicode characters: 2
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
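
As an illustration of the idea, the general category of each character can be looked up with Python's standard `unicodedata` module. This is a sketch, not the profiler's implementation: `unicodedata.category` returns two-letter codes, so the `CATEGORY_NAMES` mapping and the helper `category_counts` are introduced here only to match the long names used in this report. Scripts and blocks are not available from the standard library and are left out.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# State Name is "PA" in every one of the 13736 rows.
state_counts = category_counts(["PA"] * 13736)
# One value of the Date column, to show a mix of categories.
date_counts = category_counts(["04/01/2006 12:00:00 AM"])
```

For the State Name column this gives 27472 uppercase letters, which is the figure listed under "Most occurring categories".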

Most occurring characters

Value  Count  Frequency (%)
P      13736  50.0%
A      13736  50.0%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  27472  100.0%

Most frequent Uppercase Letter characters

Value  Count  Frequency (%)
P      13736  50.0%
A      13736  50.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  27472  100.0%

Most frequent Latin characters

Value  Count  Frequency (%)
P      13736  50.0%
A      13736  50.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  27472  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
P      13736  50.0%
A      13736  50.0%
 

FIPS State Code
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 107.4 KiB

Value  Count  Frequency (%)
42     13736  100.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Overview of Unicode Properties

Unique unicode characters: 2
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
4      13736  50.0%
2      13736  50.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  27472  100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
4      13736  50.0%
2      13736  50.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  27472  100.0%

Most frequent Common characters

Value  Count  Frequency (%)
4      13736  50.0%
2      13736  50.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  27472  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
4      13736  50.0%
2      13736  50.0%
 

Calendar Year
Real number (ℝ≥0)

Distinct: 18
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.415842
Minimum: 2003
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Memory size: 107.4 KiB

Quantile statistics

Minimum: 2003
5-th percentile: 2004
Q1: 2007
Median: 2011
Q3: 2016
95-th percentile: 2019
Maximum: 2020
Range: 17
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 4.875387193
Coefficient of variation (CV): 0.002423858405
Kurtosis: -1.183897443
Mean: 2011.415842
Median Absolute Deviation (MAD): 4
Skewness: -0.0006827321443
Sum: 27628808
Variance: 23.76940028
Monotonicity: Not monotonic
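
Several of these figures can be recomputed from the Calendar Year value counts alone (408 rows for 2003, 272 for 2020, 816 for every year in between). The sketch below uses Python's `statistics` module. The reported variance matches the sample variance (n - 1 denominator), and the coefficient of variation is the standard deviation divided by the mean.

```python
import statistics

# Calendar Year value counts: 2003 and 2020 are partial years.
year_counts = {2003: 408, 2020: 272}
year_counts.update({year: 816 for year in range(2004, 2020)})

# Expand the counts back into one value per observation.
years = [year for year, count in year_counts.items() for _ in range(count)]

total = sum(years)
mean = statistics.fmean(years)
variance = statistics.variance(years)  # sample variance, n - 1 denominator
std_dev = statistics.stdev(years)      # square root of the sample variance
cv = std_dev / mean                    # coefficient of variation
```

A coefficient of variation this small simply reflects that years near 2011 differ little relative to their magnitude; it says nothing about the spread in practical terms.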
Histogram with fixed size bins (bins=18)
Most frequent values

Value  Count  Frequency (%)
2015   816    5.9%
2007   816    5.9%
2016   816    5.9%
2009   816    5.9%
2017   816    5.9%
2010   816    5.9%
2018   816    5.9%
2011   816    5.9%
2019   816    5.9%
2004   816    5.9%
2012   816    5.9%
2005   816    5.9%
2013   816    5.9%
2006   816    5.9%
2014   816    5.9%
2008   816    5.9%
2003   408    3.0%
2020   272    2.0%

10 smallest values

Value  Count  Frequency (%)
2003   408    3.0%
2004   816    5.9%
2005   816    5.9%
2006   816    5.9%
2007   816    5.9%
2008   816    5.9%
2009   816    5.9%
2010   816    5.9%
2011   816    5.9%
2012   816    5.9%

10 largest values

Value  Count  Frequency (%)
2020   272    2.0%
2019   816    5.9%
2018   816    5.9%
2017   816    5.9%
2016   816    5.9%
2015   816    5.9%
2014   816    5.9%
2013   816    5.9%
2012   816    5.9%
2011   816    5.9%
 

Calendar Month
Real number (ℝ≥0)

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.50990099
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 107.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 7
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.467427648
Coefficient of variation (CV): 0.5326390759
Kurtosis: -1.230742105
Mean: 6.50990099
Median Absolute Deviation (MAD): 3
Skewness: -0.008151845947
Sum: 89420
Variance: 12.02305449
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Most frequent values

Value  Count  Frequency (%)
7      1156   8.4%
12     1156   8.4%
4      1156   8.4%
11     1156   8.4%
3      1156   8.4%
10     1156   8.4%
2      1156   8.4%
9      1156   8.4%
1      1156   8.4%
8      1156   8.4%
6      1088   7.9%
5      1088   7.9%

10 smallest values

Value  Count  Frequency (%)
1      1156   8.4%
2      1156   8.4%
3      1156   8.4%
4      1156   8.4%
5      1088   7.9%
6      1088   7.9%
7      1156   8.4%
8      1156   8.4%
9      1156   8.4%
10     1156   8.4%

10 largest values

Value  Count  Frequency (%)
12     1156   8.4%
11     1156   8.4%
10     1156   8.4%
9      1156   8.4%
8      1156   8.4%
7      1156   8.4%
6      1088   7.9%
5      1088   7.9%
4      1156   8.4%
3      1156   8.4%
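
The yearly and monthly counts are consistent with one row per area per month over a window running from July 2003 through April 2020. That window is not stated anywhere in the report; it is an inference from the counts (6 months' worth of rows in 2003, 4 in 2020, and one fewer May and June than other months). The sketch below enumerates the months under that assumption and rebuilds the row counts; `month_range` is a hypothetical helper.

```python
from collections import Counter


def month_range(start_year, start_month, end_year, end_month):
    """Yield (year, month) pairs from the start month through the end month."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


N_AREAS = 68  # distinct County Name values, including the "Statewide" row

# Assumed reporting window, inferred from the counts in this report.
months = list(month_range(2003, 7, 2020, 4))

rows_per_year = Counter()
rows_per_month = Counter()
for year, month in months:
    rows_per_year[year] += N_AREAS
    rows_per_month[month] += N_AREAS

total_rows = len(months) * N_AREAS
```

Under this assumption the 202 months match the 202 distinct Date values, and 202 × 68 gives the 13736 observations in the overview.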
 

Month Name
Categorical

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 107.4 KiB

Value      Count  Frequency (%)
February   1156   8.4%
September  1156   8.4%
November   1156   8.4%
March      1156   8.4%
July       1156   8.4%
August     1156   8.4%
January    1156   8.4%
December   1156   8.4%
April      1156   8.4%
October    1156   8.4%
May        1088   7.9%
June       1088   7.9%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 9
Median length: 7
Mean length: 6.193069307
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 26
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
e      12648  14.9%
r      10404  12.2%
u      6868   8.1%
b      5780   6.8%
a      5712   6.7%
y      4556   5.4%
c      3468   4.1%
t      3468   4.1%
m      3468   4.1%
J      3400   4.0%
o      2312   2.7%
l      2312   2.7%
A      2312   2.7%
p      2312   2.7%
n      2244   2.6%
M      2244   2.6%
O      1156   1.4%
i      1156   1.4%
h      1156   1.4%
g      1156   1.4%
s      1156   1.4%
F      1156   1.4%
N      1156   1.4%
v      1156   1.4%
D      1156   1.4%
 

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  71332  83.9%
Uppercase Letter  13736  16.1%

Most frequent Uppercase Letter characters

Value  Count  Frequency (%)
J      3400   24.8%
A      2312   16.8%
M      2244   16.3%
O      1156   8.4%
F      1156   8.4%
N      1156   8.4%
D      1156   8.4%
S      1156   8.4%

Most frequent Lowercase Letter characters

Value  Count  Frequency (%)
e      12648  17.7%
r      10404  14.6%
u      6868   9.6%
b      5780   8.1%
a      5712   8.0%
y      4556   6.4%
c      3468   4.9%
t      3468   4.9%
m      3468   4.9%
o      2312   3.2%
l      2312   3.2%
p      2312   3.2%
n      2244   3.1%
i      1156   1.6%
h      1156   1.6%
g      1156   1.6%
s      1156   1.6%
v      1156   1.6%
 

Most occurring scripts

Value  Count  Frequency (%)
Latin  85068  100.0%

Most frequent Latin characters

Value  Count  Frequency (%)
e      12648  14.9%
r      10404  12.2%
u      6868   8.1%
b      5780   6.8%
a      5712   6.7%
y      4556   5.4%
c      3468   4.1%
t      3468   4.1%
m      3468   4.1%
J      3400   4.0%
o      2312   2.7%
l      2312   2.7%
A      2312   2.7%
p      2312   2.7%
n      2244   2.6%
M      2244   2.6%
O      1156   1.4%
i      1156   1.4%
h      1156   1.4%
g      1156   1.4%
s      1156   1.4%
F      1156   1.4%
N      1156   1.4%
v      1156   1.4%
D      1156   1.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  85068  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
e      12648  14.9%
r      10404  12.2%
u      6868   8.1%
b      5780   6.8%
a      5712   6.7%
y      4556   5.4%
c      3468   4.1%
t      3468   4.1%
m      3468   4.1%
J      3400   4.0%
o      2312   2.7%
l      2312   2.7%
A      2312   2.7%
p      2312   2.7%
n      2244   2.6%
M      2244   2.6%
O      1156   1.4%
i      1156   1.4%
h      1156   1.4%
g      1156   1.4%
s      1156   1.4%
F      1156   1.4%
N      1156   1.4%
v      1156   1.4%
D      1156   1.4%
 

Date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 202
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 107.4 KiB

Value                   Count  Frequency (%)
04/01/2006 12:00:00 AM  68     0.5%
01/01/2016 12:00:00 AM  68     0.5%
01/01/2020 12:00:00 AM  68     0.5%
06/01/2019 12:00:00 AM  68     0.5%
09/01/2006 12:00:00 AM  68     0.5%
04/01/2013 12:00:00 AM  68     0.5%
05/01/2009 12:00:00 AM  68     0.5%
10/01/2012 12:00:00 AM  68     0.5%
03/01/2007 12:00:00 AM  68     0.5%
12/01/2004 12:00:00 AM  68     0.5%
09/01/2010 12:00:00 AM  68     0.5%
12/01/2011 12:00:00 AM  68     0.5%
06/01/2014 12:00:00 AM  68     0.5%
11/01/2019 12:00:00 AM  68     0.5%
01/01/2017 12:00:00 AM  68     0.5%
02/01/2020 12:00:00 AM  68     0.5%
04/01/2015 12:00:00 AM  68     0.5%
07/01/2015 12:00:00 AM  68     0.5%
11/01/2017 12:00:00 AM  68     0.5%
07/01/2006 12:00:00 AM  68     0.5%
05/01/2016 12:00:00 AM  68     0.5%
06/01/2015 12:00:00 AM  68     0.5%
08/01/2009 12:00:00 AM  68     0.5%
11/01/2012 12:00:00 AM  68     0.5%
09/01/2016 12:00:00 AM  68     0.5%
Other values (177)      12036  87.6%

Frequencies of value counts
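
The Date column is profiled as Categorical because its values are strings, which is also why it triggers the high-cardinality warning. A minimal sketch of parsing those strings with the standard library follows. The format string is inferred from the values listed here (month first, 12-hour clock with an AM/PM marker) and `parse_report_date` is a hypothetical helper.

```python
from datetime import datetime

# Month/day/year with a 12-hour clock, as in "04/01/2006 12:00:00 AM".
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_report_date(text):
    """Parse one Date string from the dataset into a datetime."""
    return datetime.strptime(text, DATE_FORMAT)


raw_values = [
    "04/01/2006 12:00:00 AM",
    "01/01/2016 12:00:00 AM",
    "12/01/2004 12:00:00 AM",
]
parsed = [parse_report_date(value) for value in raw_values]
# Sorting the parsed values gives chronological order; sorting the raw strings would not.
chronological = sorted(parsed)
```

Once parsed, the column can be treated as a date rather than as 202 unrelated categories.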

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 22
Median length: 22
Mean length: 22
Min length: 22

Overview of Unicode Properties

Unique unicode characters: 15
Unique unicode categories: 4
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count   Frequency (%)
0        100232  33.2%
1        42228   14.0%
2        30872   10.2%
/        27472   9.1%
(space)  27472   9.1%
:        27472   9.1%
A        13736   4.5%
M        13736   4.5%
7        2788    0.9%
4        2788    0.9%
9        2788    0.9%
8        2788    0.9%
6        2720    0.9%
5        2720    0.9%
3        2380    0.8%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     192304  63.6%
Other Punctuation  54944   18.2%
Space Separator    27472   9.1%
Uppercase Letter   27472   9.1%

Most frequent Decimal Number characters

Value  Count   Frequency (%)
0      100232  52.1%
1      42228   22.0%
2      30872   16.1%
7      2788    1.4%
4      2788    1.4%
9      2788    1.4%
8      2788    1.4%
6      2720    1.4%
5      2720    1.4%
3      2380    1.2%

Most frequent Other Punctuation characters

Value  Count  Frequency (%)
/      27472  50.0%
:      27472  50.0%

Most frequent Space Separator characters

Value    Count  Frequency (%)
(space)  27472  100.0%

Most frequent Uppercase Letter characters

Value  Count  Frequency (%)
A      13736  50.0%
M      13736  50.0%
 

Most occurring scripts

Value   Count   Frequency (%)
Common  274720  90.9%
Latin   27472   9.1%

Most frequent Common characters

Value    Count   Frequency (%)
0        100232  36.5%
1        42228   15.4%
2        30872   11.2%
/        27472   10.0%
(space)  27472   10.0%
:        27472   10.0%
7        2788    1.0%
4        2788    1.0%
9        2788    1.0%
8        2788    1.0%
6        2720    1.0%
5        2720    1.0%
3        2380    0.9%

Most frequent Latin characters

Value  Count  Frequency (%)
A      13736  50.0%
M      13736  50.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  302192  100.0%

Most frequent ASCII characters

Value    Count   Frequency (%)
0        100232  33.2%
1        42228   14.0%
2        30872   10.2%
/        27472   9.1%
(space)  27472   9.1%
:        27472   9.1%
A        13736   4.5%
M        13736   4.5%
7        2788    0.9%
4        2788    0.9%
9        2788    0.9%
8        2788    0.9%
6        2720    0.9%
5        2720    0.9%
3        2380    0.8%
 

County Code
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 68
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.5
Minimum: 0
Maximum: 67
Zeros: 202
Zeros (%): 1.5%
Memory size: 107.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 16.75
Median: 33.5
Q3: 50.25
95-th percentile: 64
Maximum: 67
Range: 67
Interquartile range (IQR): 33.5
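
The fractional quartiles come from linear interpolation between neighbouring sorted values. The sketch below rebuilds the County Code column from its value counts (each code 0..67 occurs 202 times) and recomputes the quartiles with `statistics.quantiles` using the "inclusive" method, which interpolates in this way. It also computes the median absolute deviation as the median distance from the median.

```python
import statistics

# County Code: each value 0..67 occurs exactly 202 times (68 * 202 = 13736 rows).
codes = [code for code in range(68) for _ in range(202)]

# Quartiles with linear interpolation between the two nearest sorted values.
q1, median, q3 = statistics.quantiles(codes, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation: the median of the distances from the median.
mad = statistics.median(abs(code - median) for code in codes)
```

The result reproduces the quartiles and interquartile range listed here, as well as the MAD of 17 reported under the descriptive statistics.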

Descriptive statistics

Standard deviation: 19.62850093
Coefficient of variation (CV): 0.5859254009
Kurtosis: -1.20051932
Mean: 33.5
Median Absolute Deviation (MAD): 17
Skewness: 0
Sum: 460156
Variance: 385.2780488
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count  Frequency (%)
63                 202    1.5%
50                 202    1.5%
2                  202    1.5%
10                 202    1.5%
18                 202    1.5%
26                 202    1.5%
34                 202    1.5%
42                 202    1.5%
58                 202    1.5%
55                 202    1.5%
66                 202    1.5%
3                  202    1.5%
11                 202    1.5%
19                 202    1.5%
27                 202    1.5%
35                 202    1.5%
65                 202    1.5%
57                 202    1.5%
49                 202    1.5%
41                 202    1.5%
33                 202    1.5%
25                 202    1.5%
17                 202    1.5%
9                  202    1.5%
1                  202    1.5%
Other values (43)  8686   63.2%

10 smallest values

Value  Count  Frequency (%)
0      202    1.5%
1      202    1.5%
2      202    1.5%
3      202    1.5%
4      202    1.5%
5      202    1.5%
6      202    1.5%
7      202    1.5%
8      202    1.5%
9      202    1.5%

10 largest values

Value  Count  Frequency (%)
67     202    1.5%
66     202    1.5%
65     202    1.5%
64     202    1.5%
63     202    1.5%
62     202    1.5%
61     202    1.5%
60     202    1.5%
59     202    1.5%
58     202    1.5%
 

FIPS County Code
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 68
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.01470588
Minimum: 0
Maximum: 133
Zeros: 202
Zeros (%): 1.5%
Memory size: 107.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 32.5
Median: 66
Q3: 99.5
95-th percentile: 127
Maximum: 133
Range: 133
Interquartile range (IQR): 67

Descriptive statistics

Standard deviation: 39.23207813
Coefficient of variation (CV): 0.5942930079
Kurtosis: -1.203250816
Mean: 66.01470588
Median Absolute Deviation (MAD): 34
Skewness: 0.002107033734
Sum: 906778
Variance: 1539.155954
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count  Frequency (%)
127                202    1.5%
51                 202    1.5%
3                  202    1.5%
11                 202    1.5%
19                 202    1.5%
27                 202    1.5%
35                 202    1.5%
43                 202    1.5%
59                 202    1.5%
119                202    1.5%
67                 202    1.5%
75                 202    1.5%
83                 202    1.5%
91                 202    1.5%
99                 202    1.5%
107                202    1.5%
129                202    1.5%
121                202    1.5%
113                202    1.5%
105                202    1.5%
97                 202    1.5%
89                 202    1.5%
81                 202    1.5%
73                 202    1.5%
65                 202    1.5%
Other values (43)  8686   63.2%

10 smallest values

Value  Count  Frequency (%)
0      202    1.5%
1      202    1.5%
3      202    1.5%
5      202    1.5%
7      202    1.5%
9      202    1.5%
11     202    1.5%
13     202    1.5%
15     202    1.5%
17     202    1.5%

10 largest values

Value  Count  Frequency (%)
133    202    1.5%
131    202    1.5%
129    202    1.5%
127    202    1.5%
125    202    1.5%
123    202    1.5%
121    202    1.5%
119    202    1.5%
117    202    1.5%
115    202    1.5%
 

County Name
Categorical

HIGH CARDINALITY
HIGH CORRELATION
UNIFORM

Distinct: 68
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 107.4 KiB

Value              Count  Frequency (%)
Dauphin            202    1.5%
Clinton            202    1.5%
Washington         202    1.5%
Forest             202    1.5%
Bedford            202    1.5%
York               202    1.5%
Susquehanna        202    1.5%
Franklin           202    1.5%
Montour            202    1.5%
Northampton        202    1.5%
Monroe             202    1.5%
Pike               202    1.5%
Chester            202    1.5%
Schuylkill         202    1.5%
Venango            202    1.5%
Warren             202    1.5%
Fulton             202    1.5%
Indiana            202    1.5%
Luzerne            202    1.5%
Perry              202    1.5%
Statewide          202    1.5%
Columbia           202    1.5%
Mercer             202    1.5%
Adams              202    1.5%
Lehigh             202    1.5%
Other values (43)  8686   63.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 14
Median length: 7
Mean length: 7.323529412
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 45
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value              Count  Frequency (%)
e                  10908  10.8%
n                  10100  10.0%
r                  9090   9.0%
a                  9090   9.0%
o                  6666   6.6%
i                  5252   5.2%
t                  4848   4.8%
l                  4848   4.8%
u                  3030   3.0%
d                  3030   3.0%
m                  2626   2.6%
s                  2424   2.4%
h                  2424   2.4%
C                  2222   2.2%
g                  2020   2.0%
y                  1818   1.8%
c                  1616   1.6%
k                  1616   1.6%
f                  1616   1.6%
B                  1414   1.4%
L                  1414   1.4%
M                  1212   1.2%
S                  1212   1.2%
b                  1212   1.2%
w                  1010   1.0%
Other values (20)  7878   7.8%
 

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  86658  86.1%
Uppercase Letter  13938  13.9%

Most frequent Uppercase Letter characters

Value  Count  Frequency (%)
C      2222   15.9%
B      1414   10.1%
L      1414   10.1%
M      1212   8.7%
S      1212   8.7%
W      1010   7.2%
P      808    5.8%
F      808    5.8%
A      606    4.3%
N      404    2.9%
D      404    2.9%
J      404    2.9%
E      404    2.9%
K      202    1.4%
Y      202    1.4%
G      202    1.4%
H      202    1.4%
T      202    1.4%
V      202    1.4%
I      202    1.4%
U      202    1.4%

Most frequent Lowercase Letter characters

Value  Count  Frequency (%)
e      10908  12.6%
n      10100  11.7%
r      9090   10.5%
a      9090   10.5%
o      6666   7.7%
i      5252   6.1%
t      4848   5.6%
l      4848   5.6%
u      3030   3.5%
d      3030   3.5%
m      2626   3.0%
s      2424   2.8%
h      2424   2.8%
g      2020   2.3%
y      1818   2.1%
c      1616   1.9%
k      1616   1.9%
f      1616   1.9%
b      1212   1.4%
w      1010   1.2%
p      606    0.7%
v      404    0.5%
q      202    0.2%
z      202    0.2%
 

Most occurring scripts

Value  Count   Frequency (%)
Latin  100596  100.0%

Most frequent Latin characters

Value              Count  Frequency (%)
e                  10908  10.8%
n                  10100  10.0%
r                  9090   9.0%
a                  9090   9.0%
o                  6666   6.6%
i                  5252   5.2%
t                  4848   4.8%
l                  4848   4.8%
u                  3030   3.0%
d                  3030   3.0%
m                  2626   2.6%
s                  2424   2.4%
h                  2424   2.4%
C                  2222   2.2%
g                  2020   2.0%
y                  1818   1.8%
c                  1616   1.6%
k                  1616   1.6%
f                  1616   1.6%
B                  1414   1.4%
L                  1414   1.4%
M                  1212   1.2%
S                  1212   1.2%
b                  1212   1.2%
w                  1010   1.0%
Other values (20)  7878   7.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  100596  100.0%

Most frequent ASCII characters

Value              Count  Frequency (%)
e                  10908  10.8%
n                  10100  10.0%
r                  9090   9.0%
a                  9090   9.0%
o                  6666   6.6%
i                  5252   5.2%
t                  4848   4.8%
l                  4848   4.8%
u                  3030   3.0%
d                  3030   3.0%
m                  2626   2.6%
s                  2424   2.4%
h                  2424   2.4%
C                  2222   2.2%
g                  2020   2.0%
y                  1818   1.8%
c                  1616   1.6%
k                  1616   1.6%
f                  1616   1.6%
B                  1414   1.4%
L                  1414   1.4%
M                  1212   1.2%
S                  1212   1.2%
b                  1212   1.2%
w                  1010   1.0%
Other values (20)  7878   7.8%
 

MA Individuals
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 11567
Distinct (%): 84.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66417.21469
Minimum: 880
Maximum: 2908088
Zeros: 0
Zeros (%): 0.0%
Memory size: 107.4 KiB

Quantile statistics

Minimum: 880
5-th percentile: 2272.5
Q1: 7673.5
Median: 15248.5
Q3: 35606.25
95-th percentile: 107811.5
Maximum: 2908088
Range: 2907208
Interquartile range (IQR): 27932.75

Descriptive statistics

Standard deviation: 281568.7611
Coefficient of variation (CV): 4.239394296
Kurtosis: 63.35478059
Mean: 66417.21469
Median Absolute Deviation (MAD): 9900.5
Skewness: 7.784886354
Sum: 912306861
Variance: 7.928096724e+10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value                 Count  Frequency (%)
958                   8      0.1%
1136                  7      0.1%
1009                  7      0.1%
959                   6      < 0.1%
995                   6      < 0.1%
1110                  6      < 0.1%
8011                  6      < 0.1%
1116                  5      < 0.1%
1008                  5      < 0.1%
911                   5      < 0.1%
979                   5      < 0.1%
1004                  5      < 0.1%
3245                  5      < 0.1%
1006                  5      < 0.1%
6791                  5      < 0.1%
940                   5      < 0.1%
1127                  5      < 0.1%
945                   5      < 0.1%
10572                 5      < 0.1%
1121                  5      < 0.1%
974                   5      < 0.1%
6731                  4      < 0.1%
7863                  4      < 0.1%
6617                  4      < 0.1%
1132                  4      < 0.1%
Other values (11542)  13604  99.0%

10 smallest values

Value  Count  Frequency (%)
880    1      < 0.1%
881    2      < 0.1%
883    1      < 0.1%
885    1      < 0.1%
887    1      < 0.1%
889    2      < 0.1%
894    3      < 0.1%
896    1      < 0.1%
897    1      < 0.1%
898    3      < 0.1%

10 largest values

Value    Count  Frequency (%)
2908088  1      < 0.1%
2908017  1      < 0.1%
2906959  1      < 0.1%
2902616  1      < 0.1%
2894860  1      < 0.1%
2893666  1      < 0.1%
2892070  1      < 0.1%
2888743  1      < 0.1%
2887989  1      < 0.1%
2886224  1      < 0.1%
 

MA Children
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9826
Distinct (%): 71.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31564.45166
Minimum: 317
Maximum: 1230401
Zeros: 0
Zeros (%): 0.0%
Memory size: 107.4 KiB

Quantile statistics

Minimum: 317
5-th percentile: 1045
Q1: 3543.75
Median: 6763.5
Q3: 15876
95-th percentile: 49019.5
Maximum: 1230401
Range: 1230084
Interquartile range (IQR): 12332.25

Descriptive statistics

Standard deviation: 132269.139
Coefficient of variation (CV): 4.190446279
Kurtosis: 58.39965152
Mean: 31564.45166
Median Absolute Deviation (MAD): 4347
Skewness: 7.556646925
Sum: 433569308
Variance: 1.749512514e+10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value                Count  Frequency (%)
412                  9      0.1%
336                  8      0.1%
4071                 8      0.1%
408                  8      0.1%
4014                 8      0.1%
4522                 8      0.1%
4065                 7      0.1%
3217                 7      0.1%
409                  7      0.1%
3471                 7      0.1%
3433                 7      0.1%
3569                 7      0.1%
338                  7      0.1%
413                  7      0.1%
5840                 7      0.1%
3472                 7      0.1%
343                  7      0.1%
5410                 7      0.1%
3189                 7      0.1%
534                  6      < 0.1%
419                  6      < 0.1%
345                  6      < 0.1%
3479                 6      < 0.1%
4094                 6      < 0.1%
3973                 6      < 0.1%
Other values (9801)  13560  98.7%

10 smallest values

Value  Count  Frequency (%)
317    1      < 0.1%
320    1      < 0.1%
322    1      < 0.1%
323    2      < 0.1%
324    3      < 0.1%
325    2      < 0.1%
326    2      < 0.1%
327    1      < 0.1%
328    2      < 0.1%
329    3      < 0.1%

10 largest values

Value    Count  Frequency (%)
1230401  1      < 0.1%
1228925  1      < 0.1%
1228592  1      < 0.1%
1227717  1      < 0.1%
1227077  1      < 0.1%
1226886  1      < 0.1%
1225541  1      < 0.1%
1224342  1      < 0.1%
1222778  1      < 0.1%
1222064  1      < 0.1%
 
Distinct: 17
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 107.4 KiB

Value      Count  Frequency (%)
2010-2011  816    5.9%
2013-2014  816    5.9%
2003-2004  816    5.9%
2004-2005  816    5.9%
2005-2006  816    5.9%
2015-2016  816    5.9%
2012-2013  816    5.9%
2016-2017  816    5.9%
2011-2012  816    5.9%
2018-2019  816    5.9%
2009-2010  816    5.9%
2014-2015  816    5.9%
2007-2008  816    5.9%
2006-2007  816    5.9%
2017-2018  816    5.9%
2008-2009  816    5.9%
2019-2020  680    5.0%

Frequencies of value counts
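
The year-range labels and their counts are consistent with twelve-month periods that begin in July: every period has 816 rows (12 months × 68 areas) except the last, which has 680 (10 months). The July start and the July 2003 to April 2020 window are assumptions inferred from the counts, not something the report states. A sketch of deriving such a label from a calendar year and month, with a hypothetical helper `period_label`:

```python
from collections import Counter


def period_label(year, month, start_month=7):
    """Return a 'YYYY-YYYY' label for a twelve-month period starting in start_month."""
    first = year if month >= start_month else year - 1
    return f"{first}-{first + 1}"


N_AREAS = 68  # distinct County Name values, including the "Statewide" row

# Walk the assumed window month by month and count rows per label.
label_rows = Counter()
year, month = 2003, 7
while (year, month) <= (2020, 4):
    label_rows[period_label(year, month)] += N_AREAS
    month += 1
    if month > 12:
        year, month = year + 1, 1
```

Under these assumptions the sketch yields 17 labels with the same counts as the frequency table.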

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Overview of Unicode Properties

Unique unicode characters: 11
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
0      40392  32.7%
2      29784  24.1%
1      17816  14.4%
-      13736  11.1%
4      3264   2.6%
5      3264   2.6%
6      3264   2.6%
7      3264   2.6%
8      3264   2.6%
9      3128   2.5%
3      2448   2.0%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    109888  88.9%
Dash Punctuation  13736   11.1%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
0      40392  36.8%
2      29784  27.1%
1      17816  16.2%
4      3264   3.0%
5      3264   3.0%
6      3264   3.0%
7      3264   3.0%
8      3264   3.0%
9      3128   2.8%
3      2448   2.2%

Most frequent Dash Punctuation characters

Value  Count  Frequency (%)
-      13736  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  123624  100.0%

Most frequent Common characters

Value  Count  Frequency (%)
0      40392  32.7%
2      29784  24.1%
1      17816  14.4%
-      13736  11.1%
4      3264   2.6%
5      3264   2.6%
6      3264   2.6%
7      3264   2.6%
8      3264   2.6%
9      3128   2.5%
3      2448   2.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  123624  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
0      40392  32.7%
2      29784  24.1%
1      17816  14.4%
-      13736  11.1%
4      3264   2.6%
5      3264   2.6%
6      3264   2.6%
7      3264   2.6%
8      3264   2.6%
9      3128   2.5%
3      2448   2.0%
 

Latitude
Real number (ℝ≥0)

Distinct: 68
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.79717159
Minimum: 39.346129
Maximum: 41.994138
Zeros: 0
Zeros (%): 0.0%
Memory size: 107.4 KiB

Quantile statistics

Minimum: 39.346129
5-th percentile: 39.916579
Q1: 40.328837
Median: 40.793116
Q3: 41.3355785
95-th percentile: 41.810371
Maximum: 41.994138
Range: 2.648009
Interquartile range (IQR): 1.0067415

Descriptive statistics

Standard deviation: 0.6347239913
Coefficient of variation (CV): 0.01555803911
Kurtosis: -0.9701487931
Mean: 40.79717159
Median Absolute Deviation (MAD): 0.524359
Skewness: -0.008195782671
Sum: 560389.9489
Variance: 0.4028745452
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count  Frequency (%)
41.178364          202    1.5%
40.467355          202    1.5%
40.209899          202    1.5%
41.131391          202    1.5%
40.007375          202    1.5%
41.232863          202    1.5%
40.99325           202    1.5%
40.919367          202    1.5%
40.367597          202    1.5%
41.403413          202    1.5%
40.191097          202    1.5%
39.916579          202    1.5%
40.491275          202    1.5%
41.438803          202    1.5%
40.707512          202    1.5%
40.771137          202    1.5%
41.994138          202    1.5%
40.004444          202    1.5%
39.872096          202    1.5%
40.612749          202    1.5%
40.335011          202    1.5%
41.649698          202    1.5%
40.919314          202    1.5%
40.910832          202    1.5%
39.854804          202    1.5%
Other values (43)  8686   63.2%

10 smallest values

Value      Count  Frequency (%)
39.346129  202    1.5%
39.854804  202    1.5%
39.872096  202    1.5%
39.916579  202    1.5%
39.919448  202    1.5%
39.921925  202    1.5%
39.924875  202    1.5%
39.927862  202    1.5%
39.971463  202    1.5%
39.974871  202    1.5%

10 largest values

Value      Count  Frequency (%)
41.994138  202    1.5%
41.820569  202    1.5%
41.816752  202    1.5%
41.810371  202    1.5%
41.791178  202    1.5%
41.773338  202    1.5%
41.744206  202    1.5%
41.685469  202    1.5%
41.649698  202    1.5%
41.518925  202    1.5%
 

Longitude
Real number (ℝ)

Distinct: 68
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -77.58356299
Minimum: -80.351074
Maximum: -75.032709
Zeros: 0
Zeros (%): 0.0%
Memory size: 107.4 KiB

Quantile statistics

Minimum: -80.351074
5-th percentile: -80.251801
Q1: -79.04658525
Median: -77.337219
Q3: -76.1720255
95-th percentile: -75.167756
Maximum: -75.032709
Range: 5.318365
Interquartile range (IQR): 2.87455975

Descriptive statistics

Standard deviation: 1.654444087
Coefficient of variation (CV): -0.02132467269
Kurtosis: -1.24461225
Mean: -77.58356299
Median Absolute Deviation (MAD): 1.3940845
Skewness: -0.1432593023
Sum: -1065687.821
Variance: 2.737185235
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count  Frequency (%)
-76.514792         202    1.5%
-76.711884         202    1.5%
-75.312637         202    1.5%
-78.491165         202    1.5%
-76.518256         202    1.5%
-80.251801         202    1.5%
-78.57122          202    1.5%
-75.140236         202    1.5%
-77.642838         202    1.5%
-77.222243         202    1.5%
-75.37252          202    1.5%
-76.410022         202    1.5%
-76.251388         202    1.5%
-78.11485          202    1.5%
-75.112912         202    1.5%
-79.424836         202    1.5%
-76.725761         202    1.5%
-75.600995         202    1.5%
-77.982766         202    1.5%
-79.278582         202    1.5%
-80.040759         202    1.5%
-76.779606         202    1.5%
-79.031002         202    1.5%
-77.259074         202    1.5%
-79.986198         202    1.5%
Other values (43)  8686   63.2%

10 smallest values

Value       Count  Frequency (%)
-80.351074  202    1.5%
-80.337541  202    1.5%
-80.260094  202    1.5%
-80.251801  202    1.5%
-80.229438  202    1.5%
-80.113211  202    1.5%
-80.040759  202    1.5%
-79.986198  202    1.5%
-79.917118  202    1.5%
-79.762866  202    1.5%

10 largest values

Value       Count  Frequency (%)
-75.032709  202    1.5%
-75.112912  202    1.5%
-75.140236  202    1.5%
-75.167756  202    1.5%
-75.305154  202    1.5%
-75.312637  202    1.5%
-75.340836  202    1.5%
-75.37252   202    1.5%
-75.406277  202    1.5%
-75.600995  202    1.5%
 

Georeferenced Latitude & Longitude
Categorical

HIGH CARDINALITY
HIGH CORRELATION
UNIFORM

Distinct: 68
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 107.4 KiB

Value                                         Count  Frequency (%)
POINT (-75.406277 39.916579)                  202    1.5%
POINT (-77.257881 41.773338)                  202    1.5%
POINT (-75.600995 40.614648)                  202    1.5%
POINT (-79.76286600000002 41.40341300000001)  202    1.5%
POINT (-78.718942 40.491275)                  202    1.5%
POINT (-78.209169 41.438803)                  202    1.5%
POINT (-78.57122 41.810371)                   202    1.5%
POINT (-80.040759 41.994138)                  202    1.5%
POINT (-76.518256 41.791178)                  202    1.5%
POINT (-78.11485 39.924875)                   202    1.5%
POINT (-80.337541 40.99325)                   202    1.5%
POINT (-77.642838 41.232863)                  202    1.5%
POINT (-76.01813 41.518925)                   202    1.5%
POINT (-75.312637 40.754595)                  202    1.5%
POINT (-77.723988 39.927862)                  202    1.5%
POINT (-75.802503 41.820569)                  202    1.5%
POINT (-77.222243 39.872096)                  202    1.5%
POINT (-80.351074 40.683492)                  202    1.5%
POINT (-80.260094 41.302378)                  202    1.5%
POINT (-76.223324 40.707512)                  202    1.5%
POINT (-79.986198 40.467355)                  202    1.5%
POINT (-77.620031 40.612749)                  202    1.5%
POINT (-80.113211 41.685469)                  202    1.5%
POINT (-77.069425 41.344598)                  202    1.5%
POINT (-78.475583 41.000429)                  202    1.5%
Other values (43)                             8686   63.2%

Frequencies of value counts
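
Each value is a point in well-known text (WKT) form, with longitude first and latitude second. One value, POINT (-79.76286600000002 41.40341300000001), carries floating-point noise, which accounts for the maximum length of 44. The sketch below parses the strings with a regular expression and rounds to six decimal places; the six-digit precision is an assumption based on the other values, and `parse_point` is a hypothetical helper.

```python
import re

# Matches "POINT (<longitude> <latitude>)" with optional sign and decimals.
POINT_RE = re.compile(r"^POINT \((-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)$")


def parse_point(wkt, ndigits=6):
    """Return (longitude, latitude) from a WKT point, rounded to ndigits decimals."""
    match = POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    longitude, latitude = (round(float(group), ndigits) for group in match.groups())
    return longitude, latitude


clean = parse_point("POINT (-75.406277 39.916579)")
noisy = parse_point("POINT (-79.76286600000002 41.40341300000001)")
```

After rounding, the noisy value agrees with the corresponding entries in the Latitude and Longitude columns (41.403413 and -79.762866).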

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 44
Median length: 28
Mean length: 28.67647059
Min length: 27

Overview of Unicode Properties

Unique unicode characters: 20
Unique unicode categories: 7
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count  Frequency (%)
0        32724  8.3%
7        31310  7.9%
4        28078  7.1%
1        27876  7.1%
(space)  27472  7.0%
.        27472  7.0%
9        22624  5.7%
5        18180  4.6%
8        17776  4.5%
3        17776  4.5%
2        16564  4.2%
6        16160  4.1%
P        13736  3.5%
O        13736  3.5%
I        13736  3.5%
N        13736  3.5%
T        13736  3.5%
(        13736  3.5%
-        13736  3.5%
)        13736  3.5%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     229068  58.2%
Uppercase Letter   68680   17.4%
Space Separator    27472   7.0%
Other Punctuation  27472   7.0%
Open Punctuation   13736   3.5%
Dash Punctuation   13736   3.5%
Close Punctuation  13736   3.5%

Most frequent Uppercase Letter characters

Value  Count  Frequency (%)
P      13736  20.0%
O      13736  20.0%
I      13736  20.0%
N      13736  20.0%
T      13736  20.0%

Most frequent Space Separator characters

Value    Count  Frequency (%)
(space)  27472  100.0%

Most frequent Open Punctuation characters

Value  Count  Frequency (%)
(      13736  100.0%

Most frequent Dash Punctuation characters

Value  Count  Frequency (%)
-      13736  100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
0      32724  14.3%
7      31310  13.7%
4      28078  12.3%
1      27876  12.2%
9      22624  9.9%
5      18180  7.9%
8      17776  7.8%
3      17776  7.8%
2      16564  7.2%
6      16160  7.1%

Most frequent Other Punctuation characters

Value  Count  Frequency (%)
.      27472  100.0%

Most frequent Close Punctuation characters

Value  Count  Frequency (%)
)      13736  100.0%


Most occurring scripts

ValueCountFrequency (%) 
Common32522082.6%
 
Latin6868017.4%
 

Most frequent Latin characters

ValueCountFrequency (%) 
P1373620.0%
 
O1373620.0%
 
I1373620.0%
 
N1373620.0%
 
T1373620.0%
 

Most frequent Common characters

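Value | Count | Frequency (%)
0 | 32724 | 10.1%
7 | 31310 | 9.6%
4 | 28078 | 8.6%
1 | 27876 | 8.6%
(space) | 27472 | 8.4%
. | 27472 | 8.4%
9 | 22624 | 7.0%
5 | 18180 | 5.6%
8 | 17776 | 5.5%
3 | 17776 | 5.5%
2 | 16564 | 5.1%
6 | 16160 | 5.0%
( | 13736 | 4.2%
- | 13736 | 4.2%
) | 13736 | 4.2%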

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 393900 | 100.0%

Most frequent ASCII characters

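Value | Count | Frequency (%)
0 | 32724 | 8.3%
7 | 31310 | 7.9%
4 | 28078 | 7.1%
1 | 27876 | 7.1%
(space) | 27472 | 7.0%
. | 27472 | 7.0%
9 | 22624 | 5.7%
5 | 18180 | 4.6%
8 | 17776 | 4.5%
3 | 17776 | 4.5%
2 | 16564 | 4.2%
6 | 16160 | 4.1%
P | 13736 | 3.5%
O | 13736 | 3.5%
I | 13736 | 3.5%
N | 13736 | 3.5%
T | 13736 | 3.5%
( | 13736 | 3.5%
- | 13736 | 3.5%
) | 13736 | 3.5%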

Interactions

[Figures: 64 interaction plots, not reproduced in this text]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
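The calculation described above can be sketched in plain Python as follows. The function name `pearson_r` and the example data are illustrative assumptions, not part of the report; the second call checks the location and scale invariance numerically for a positive scale factor.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


# Illustrative data, not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r_original = pearson_r(x, y)
# Shifting and rescaling either variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r_original, r_rescaled)
```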

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
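A minimal sketch of that recipe: rank each variable, then apply the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, and assigning tied values the average of their positions is an assumption of this sketch (a common convention), not something the report states.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship (illustrative data).
x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this cubic example the ranks of x and y coincide, so ρ reaches its maximum while Pearson's r stays below 1.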

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
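The pair-counting definition can be sketched directly. The function below, with the hypothetical name `kendall_tau_a`, implements the simple variant often called tau-a, which divides by the total number of pairs and does not adjust for ties; library implementations commonly use a tie-adjusted variant, so results may differ on data with many ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1  # both variables order the pair the same way
        elif product < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # product == 0 means a tie; it counts toward neither total.
    return (concordant - discordant) / len(pairs)


# Illustrative data: only the pair of observations at positions 1 and 2
# is discordant, so 5 of the 6 pairs are concordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```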

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
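A minimal sketch of a bias-corrected Cramér's V in the form commonly attributed to Bergsma (2013) is given below. The function name `cramers_v_corrected` is hypothetical, the chi-squared statistic is computed without a continuity correction, and returning 0.0 for degenerate inputs is a convention of this sketch; the report's own implementation may differ in these details.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # sketch convention for degenerate inputs

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


# Illustrative labels: y_same is determined by x, y_indep is independent of x.
x = ["a", "a", "b", "b"] * 25
y_same = ["u", "u", "v", "v"] * 25
y_indep = ["u", "v", "u", "v"] * 25
v_perfect = cramers_v_corrected(x, y_same)
v_independent = cramers_v_corrected(x, y_indep)
print(v_perfect, v_independent)
```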

Missing values

[Figures: missing-values plots, not reproduced in this text]

Sample

First rows

Row | State Name | FIPS State Code | Calendar Year | Calendar Month | Month Name | Date | County Code | FIPS County Code | County Name | MA Individuals | MA Children | State Fiscal Year | Latitude | Longitude | Georeferenced Latitude & Longitude
0 | PA | 42 | 2015 | 6 | June | 06/01/2015 12:00:00 AM | 10 | 19 | Butler | 24112 | 10788 | 2014-2015 | 40.910832 | -79.917118 | POINT (-79.917118 40.910832)
1 | PA | 42 | 2013 | 10 | October | 10/01/2013 12:00:00 AM | 42 | 83 | McKean | 9145 | 4177 | 2013-2014 | 41.810371 | -78.571220 | POINT (-78.57122 41.810371)
2 | PA | 42 | 2006 | 10 | October | 10/01/2006 12:00:00 AM | 3 | 5 | Armstrong | 12194 | 5840 | 2006-2007 | 40.815095 | -79.473169 | POINT (-79.473169 40.815095)
3 | PA | 42 | 2010 | 7 | July | 07/01/2010 12:00:00 AM | 4 | 7 | Beaver | 30872 | 15210 | 2010-2011 | 40.683492 | -80.351074 | POINT (-80.351074 40.683492)
4 | PA | 42 | 2009 | 4 | April | 04/01/2009 12:00:00 AM | 0 | 0 | Statewide | 2059168 | 1043337 | 2008-2009 | 39.346129 | -75.167756 | POINT (-75.167756 39.346129)
5 | PA | 42 | 2013 | 3 | March | 03/01/2013 12:00:00 AM | 53 | 105 | Potter | 3374 | 1570 | 2012-2013 | 41.744206 | -77.898792 | POINT (-77.898792 41.744206)
6 | PA | 42 | 2014 | 8 | August | 08/01/2014 12:00:00 AM | 26 | 51 | Fayette | 35070 | 15159 | 2014-2015 | 39.919448 | -79.651896 | POINT (-79.651896 39.919448)
7 | PA | 42 | 2016 | 5 | May | 05/01/2016 12:00:00 AM | 28 | 55 | Franklin | 27493 | 13264 | 2015-2016 | 39.927862 | -77.723988 | POINT (-77.723988 39.927862)
8 | PA | 42 | 2012 | 2 | February | 02/01/2012 12:00:00 AM | 36 | 71 | Lancaster | 71597 | 37891 | 2011-2012 | 40.045908 | -76.251388 | POINT (-76.251388 40.045908)
9 | PA | 42 | 2003 | 10 | October | 10/01/2003 12:00:00 AM | 28 | 55 | Franklin | 12149 | 6525 | 2003-2004 | 39.927862 | -77.723988 | POINT (-77.723988 39.927862)

Last rows

Row | State Name | FIPS State Code | Calendar Year | Calendar Month | Month Name | Date | County Code | FIPS County Code | County Name | MA Individuals | MA Children | State Fiscal Year | Latitude | Longitude | Georeferenced Latitude & Longitude
13726 | PA | 42 | 2005 | 8 | August | 08/01/2005 12:00:00 AM | 47 | 93 | Montour | 2468 | 1048 | 2005-2006 | 41.028018 | -76.664705 | POINT (-76.664705 41.028018)
13727 | PA | 42 | 2006 | 4 | April | 04/01/2006 12:00:00 AM | 18 | 35 | Clinton | 6695 | 3097 | 2005-2006 | 41.232863 | -77.642838 | POINT (-77.642838 41.232863)
13728 | PA | 42 | 2010 | 11 | November | 11/01/2010 12:00:00 AM | 23 | 45 | Delaware | 84298 | 44485 | 2010-2011 | 39.916579 | -75.406277 | POINT (-75.406277 39.916579)
13729 | PA | 42 | 2018 | 6 | June | 06/01/2018 12:00:00 AM | 64 | 127 | Wayne | 10723 | 4303 | 2017-2018 | 41.649698 | -75.305154 | POINT (-75.305154 41.649698)
13730 | PA | 42 | 2009 | 12 | December | 12/01/2009 12:00:00 AM | 40 | 79 | Luzerne | 58787 | 29706 | 2009-2010 | 41.178364 | -75.991996 | POINT (-75.991996 41.178364)
13731 | PA | 42 | 2017 | 3 | March | 03/01/2017 12:00:00 AM | 4 | 7 | Beaver | 36273 | 14733 | 2016-2017 | 40.683492 | -80.351074 | POINT (-80.351074 40.683492)
13732 | PA | 42 | 2007 | 6 | June | 06/01/2007 12:00:00 AM | 37 | 73 | Lawrence | 17222 | 8274 | 2006-2007 | 40.993250 | -80.337541 | POINT (-80.337541 40.99325)
13733 | PA | 42 | 2017 | 9 | September | 09/01/2017 12:00:00 AM | 36 | 71 | Lancaster | 92967 | 43594 | 2017-2018 | 40.045908 | -76.251388 | POINT (-76.251388 40.045908)
13734 | PA | 42 | 2015 | 8 | August | 08/01/2015 12:00:00 AM | 13 | 25 | Carbon | 11923 | 5452 | 2015-2016 | 40.919367 | -75.711070 | POINT (-75.71107 40.919367)
13735 | PA | 42 | 2004 | 2 | February | 02/01/2004 12:00:00 AM | 30 | 59 | Greene | 8297 | 3875 | 2003-2004 | 39.854804 | -80.229438 | POINT (-80.229438 39.854804)